Cache

  • Modern CPUs fetch memory in 64-byte cache lines and prefetch adjacent memory.

Problems with OOP

  • Not cache friendly. Each a->b  requires 3 memory accesses. One for a , other for the vtable , other for b . It would be much better if a[i] , so only one memory access is required.

    • See the talk from Sergiy Migdalskiy where he goes further into this.

  • Thought exercise:

    • A Creature  is an Entity , where Entity  defines function pointers ( .process_physics ) for specialized behavior, so a creature could call for example bat.process_physics ; it's basically a method.

    • Problems:

      • Intrinsic slower performance.

      • Looping over an array of entities while only affecting one "specialization" (looping over bats) would be bad, as:

        • An Entity  doesn't store a "specialization" enum, so there's no way to know which entity belongs to that specialization.

        • Even if Entity  would store an enum (or similar) to indicate the "species variant", this is still not cache friendly at all, as the whole array with lots of entities that don't belong would have to be loaded into cache. Instead of getting the bats right away, it would be necessary to loop over all entities and branch out using a switch statement for the "species variant"; this sounds terrible for cache.

Performance: Fixed Array (Small Array) vs Dynamic Array

Discussion
  • Considering the struct IK_Chain :

    IK_Chain :: struct {
        joints: [dynamic]^Joint,
        bone_lengths: [dynamic]f32,
        target: eng.Transform_Node,
        is_target_moving: bool,
        placement: IK_Placement,
    }
    
  • A question about performance: I'm storing bone_lengths: [dynamic]f32  as a cache inside a IK_Chain . I opted for a [dynamic]  array as I don't know for sure what the length of a chain will be. Tho, now knowing about the existence of a Small_Array , I question whether I should make this a bone_length: sarray.Small_Array(SOME_REASONABLE_NUMBER, f32) , for cache locality. Seems like a trade-off between memory and speed, as by using a Small_Array I will overestimate the size of this array, to be sure to fit into all major cases of a IK_Chain  I would build. For context, now a IK_Chain  only has 4~5 Joints, but it could have 20+ or more, for some creatures. I update the IK_Chain  every frame, for visuals. So, using a small array with ~20 cap would be a nice trade off, instead of using a dynamic array? Oh, probably relevant, but using this for bone_lengths would also imply that I would also use this for the joints: [dynamic]^Joint  in the IK_Chain ; consider that the ^Joint are on the stack.

  • TLDR :

    • SmallArray wins because the entire struct + bone data often fits in 1-2 cache lines

Location and continuity
  • Dynamic Array :

    • A [dynamic]f32  stores its metadata (pointer, length, capacity) in the struct, but the actual data is heap-allocated.

    • When you do make([dynamic]f32, 0, capacity) , Odin allocates a contiguous block of memory on the heap. All elements are stored sequentially in this block:

    • [elem0][elem1][elem2]...

    • As long as you don't exceed this, you're going to have the values in a fixed contiguous region.

    • Indirection / Pointer chasing.

  • Small_Array :

    • Is embedded directly in the struct (whether the struct is on stack/heap depends on context).

    • Elements are also contiguous, but embedded within the parent struct.

  • Example :

    • Ex1: Array of structs :

      chains: [100]IK_Chain
      
      • Dynamic Array :

        • Only struct metadata is contiguous. Actual data is fragmented:

          [Chain0] → (heap_ptr0 → bones0)
                    → (heap_ptr1 → joints0)
          [Chain1] → (heap_ptr2 → bones1)
                    → (heap_ptr3 → joints1)
          
      • Small_Array :

        • All data (struct fields + bone lengths + joints) is in one contiguous memory block.

          [Chain0][bones0][joints0][Chain1][bones1][joints1]...
          
    • Ex2: Single struct :

      • Dynamic Array :

        • Requires at least two separate memory loads:

          • Load IK_Chain  struct (metadata)

          • Load bone data from heap pointer

      • Small_Array :

        • All data loaded in 1-2 cache lines.

  • They are both backing arrays fixed at point in memory and should benefit from caching.

  • "What about the joints ?" :

    • The [dynamic]^Joint  can be problematic due to double indirection (pointer to array + pointer to Joint).

      • This is worse than just dynamic arrays of values.

Dynamic Array reallocation
  • Reallocations are rare if capacity is preset, but when they occur:

    • Invalidates all caches for bone data.

    • Costs CPU cycles for allocation/memcpy.

  • A Small_Array avoids this entirely.

Deciding between memory efficiency and performance
  • Considerations :

    • If you only need AI data in one system and skeletons in another → Cache locality benefits diminish.

    • If you allocate for 50 creatures but typically have 10-15 → 70-80% memory wasted.

    • Bad if creature size > cache line (typically 64-128B).

  • DO use Small_Array for :

    • Core metadata (position, health).

    • Hot components (AI state, IK chains).

    • Fixed-size subsystems.

  • AVOID Small_Array for :

    • The entire Creature struct.

    • Large/variable data (animations).

    • The creatures container itself.